Energy management of a fuel cell/ultracapacitor hybrid power system aims to optimize energy efficiency while satisfying the operational constraints. The current challenges include ensuring that the non-linear dynamics and energy management of a hybrid power system are consistent with state and input constraints imposed by operational limitations. This paper formulates the requirements for energy management of the hybrid power system as a constrained optimal-control problem, and then transforms the problem into an unconstrained form using the penalty-function method. Radial-basis-function networks are organized in an adaptive optimal-control algorithm to synthesize an optimal strategy for energy management. The obtained optimal strategy was verified in an electric vehicle powered by combining a fuel-cell system and an ultracapacitor bank. Driving-cycle tests were conducted to investigate the fuel consumption, fuel-cell peak power, and instantaneous rate of change in fuel-cell power. The results show that the energy efficiency of the electric vehicle is significantly improved relative to that without using the optimal strategy.

©  2010  Elsevier  B.V.  All  rights  reserved. 


1.  Introduction 

A fuel-cell hybrid power system (FCHPS) for an electric vehicle augments the fuel cell with a reversible energy storage system (ESS) so that the overall system can cope with the power demands of the vehicle. The ESS can be implemented with either an ultracapacitor bank or a rechargeable battery [1-3]; this work considers the ultracapacitor-based ESS. The chief merit of this technology is that the power-capacity rating of the fuel-cell system (FCS) is required to meet the average demand only, rather than the peak demand. This makes the FCHPS more cost-effective and energy-efficient than using the fuel cell alone in powering the vehicle. Secondly, rapid load variations may induce oxygen starvation and thereby cause permanent damage to the proton-exchange membranes of the fuel cell. In contrast, the ultracapacitor exhibits superior performance in providing peak power, despite its relatively low energy density. Combining an FCS and an ultracapacitor bank can provide a power system with both high power and energy densities. Thirdly, the FCS generates electric power directly from hydrogen, but a reverse power flow is impossible; the ultracapacitor bank therefore provides a reservoir for the regenerative use of electricity.

As shown in Fig. 1, the FCHPS uses DC/DC converters to interface various types of power devices. A unidirectional boost converter interfaces the FCS with the DC bus and protects the FCS from damage by reverse current. The ESS, in turn, may employ a bidirectional buck-boost converter, which allows the ESS not only to deliver its share of the peak power but also to capture electricity for regenerative utilization. The FCHPS employs an energy-management strategy (EMS) to achieve an optimal power split between distinct power sources that improves the energy utilization. Based on a model-predictive control methodology, Vahidi
et  al.  [4]  developed  a  current-management  strategy  to  avoid  the 
problems  of  oxygen  starvation,  air-compressor  surge  and  choke 
in  FCSs.  Chen  et  al.  [5]  used  multiple-model  predictive  control  to 
optimize  the  power  usage  and  the  control  of  oxygen.  Zhu  et  al. 
[6]  adopted  a  cluster-weighted  modelling  algorithm  to  identify 
load  transients  and  determine  the  optimal  power  split  between 
the  FCS  and  ESS.  A  transient-load  recognition  technique  based  on 
wavelet-transform algorithms for hybrid energy sources (including a fuel cell, battery and ultracapacitor) was proposed in [7]. Jiang [8] investigated using an agent-based power-sharing method for implementing a distributed control scheme when combining multiple power sources. An EMS based on fuzzy logic was studied in [9-12].

Fig. 1. Architecture of the FCHPS.

Moreover, an EMS may be derived with the
help of optimization theory. Based on the equivalent consumption-minimization strategy, Paganelli et al. [13] and Rodatz et al. [14] presented a local optimal scheme that evaluated the cost function based on the hydrogen consumption and the equivalent fuel consumption of the ESS. Feroldi et al. [15] formulated an EMS as a local
optimization  problem  subject  to  a  set  of  operational  constraints, 
and  used  a  constrained  non-linear  programming  method  to  obtain 
a  solution.  Delprat  et  al.  [16]  and  Bernard  et  al.  [17]  obtained  the 
EMS  by  the  forward  iteration  of  a  state  equation,  a  costate  equation 
and  a  stationary  equation  while  assuming  that  the  initial  costate 
vector  is  available.  This  approach  provides  a  non-causal  solution 
because  the  entire  driving  cycle  must  be  known  before  obtaining 
the  initial  costate  vector,  and  the  convergence  of  this  algorithm  is 
sensitive  to  parameter  errors. 

In the present work, the requirements for the EMS are formulated as an optimal-control problem subject to a set of state and input constraints imposed by operational limitations. The penalty-function method [18] is then used to transform the constrained optimal-control problem into an unconstrained form. An adaptive optimal-control (AOC) algorithm is subsequently developed and customized for synthesizing an optimal EMS. The AOC algorithm is deduced from the minimum principle of optimal control, rather than by manipulating the Bellman equation, as in adaptive dynamic programming [19-21]. Radial-basis-function (RBF) networks are employed when constructing the AOC algorithm. The obtained optimal EMS was evaluated in the application of an electric vehicle powered by an FCHPS. The results from driving-cycle tests demonstrate the effectiveness of the optimal EMS. Here we focus on formulating a constrained optimal-control problem for energy management of an FCHPS and on developing an AOC methodology for synthesizing an optimal EMS through reinforcement learning.

This paper is organized as follows. Section 2 derives a model for the FCHPS, and the requirements of its EMS are formulated as a constrained optimal-control problem. Section 3 develops an AOC algorithm that is capable of synthesizing an optimal EMS by learning from training data. Section 4 investigates the obtained optimal EMS with driving-cycle tests of an electric vehicle powered by an FCHPS. Finally, conclusions are drawn in Section 5.

2.  Formulation  of  the  energy-management  problem 

2.1.  Model  for  the  FCHPS 

A mathematical model for the FCS can be deduced from physical laws and the operating conditions. If the gas pressure, ambient temperature and humidity are all well-regulated inside the FCS, its characteristic curve of efficiency versus net power would exhibit the result shown in Fig. 2. The power efficiency is low when the FCS operates in a low-power mode since the peripheral systems of the FCS consume power in maintaining the functionality of the overall system. In addition, the power efficiency reduces in a high-power mode due to the physical nature of fuel-cell stacks. Fig. 2 shows the overall efficiency of a 50-kW fuel-cell module as calculated using ADVISOR software [22].

Fig. 2. Efficiency of a 50-kW fuel cell.

However, knowledge of the overall efficiency is not sufficient for assessing the performance of an FCS, since the low efficiency in the low-power mode (Fig. 3) is not significant due to the low absolute power loss. Therefore, we gauged the performance of the FCHPS by deriving the hydrogen consumption from the efficiency of the fuel cell [11] as (see Fig. 3):

$$\dot{m}_{H_2} = \frac{P_{FC}}{LHV \cdot \eta_{FC}(P_{FC})} \times 105\%, \tag{1}$$

where $\dot{m}_{H_2}$ is the hydrogen-fuel consumption rate (g s$^{-1}$), $\eta_{FC}(P_{FC})$ is the efficiency of the fuel cell at net power $P_{FC}$, $LHV$ = 120 kJ g$^{-1}$ (120,000 kJ kg$^{-1}$) is the lower heating value of hydrogen and the additional 5% allows for the assumed loss of hydrogen due to the FCS purging mechanism. A performance model of the FCS is obtained by fitting the hydrogen-consumption curve in Fig. 3 with the third-order polynomial function

$$\dot{m}_{fit}(P_{FC}) = a_3 P_{FC}^3 + a_2 P_{FC}^2 + a_1 P_{FC} + a_0. \tag{2}$$

Table 1 lists the fitting parameters in Eq. (2) with $\dot{m}_{fit}$ in units of g s$^{-1}$ and $P_{FC}$ in units of 100 kW (this unit is adopted for power to ensure that the polynomial fitting function does not contain extremely large or small coefficients). An appropriate parameter set can facilitate the convergence of the optimal solution search.

Fig. 3. Hydrogen consumption of the 50-kW fuel cell.

Table 1
Curve-fitting parameters for a performance model.

Parameter    a3        a2         a1        a0
Value        4.1953    -2.0662    1.6944    0.0066

The ultracapacitor behaves similarly to a conventional capacitor. Previous studies have shown that an ultracapacitor can be modelled as an equivalent circuit with constant parameters [3,23-25]. Fig. 4 shows the equivalent circuit model used in this study. According to this model, the energy ($E$) stored in the ultracapacitor is linearly proportional to the square of the capacitor voltage. The maximum energy ($E_{max}$) that can be stored in an ultracapacitor is limited by its rated voltage ($V_{max}$) according to

$$E = \frac{1}{2} C_{uc} V_c^2, \tag{3}$$

$$E_{max} = \frac{1}{2} C_{uc} V_{max}^2. \tag{4}$$

Fig. 4. Model of an ultracapacitor as an RC equivalent circuit.

The state of charge (SoC) of an ultracapacitor is defined as the ratio of $E$ to $E_{max}$:

$$SoC = \frac{E}{E_{max}}. \tag{5}$$

As shown in Fig. 4, $P_{ESS}$ denotes the power flow at the terminal nodes of the ultracapacitor, and $P_C$ denotes the power passing through internal capacitor $C_{uc}$. The convention used here is for discharging power and charging power to have positive and negative values, respectively. The parasitic resistance ($R_{uc}$) results in significant power loss during charging and discharging, and leads to a charge-discharge contour plot that takes the form

$$P_C(SoC, P_{ESS}) = \left(1 - \sqrt{1 - \frac{2 R_{uc} C_{uc}}{SoC \cdot E_{max}} P_{ESS}}\right) \frac{SoC \cdot E_{max}}{R_{uc} C_{uc}}. \tag{6}$$

Fig. 5 shows the deduced charge-discharge contour plot of an ultracapacitor. The dynamics of an ultracapacitor can be modelled by a first-order difference equation:

$$SoC(k+1) = SoC(k) - P_C(SoC(k), P_{ESS}(k)) \cdot \frac{\Delta T}{E_{max}}, \tag{7}$$

where $k$ (with $k$ = 0, 1, 2, ...) is the index of the time steps and $\Delta T$ is the sampling period.

Fig. 5. Charge-discharge contour plot of an ultracapacitor bank.

2.2. Objective of energy management

Manipulating an FCHPS involves controlling the DC/DC converters and managing how power is split among distributed power sources. The converter-control loop typically responds 100-1000 times faster than the energy-management-strategy loop. Therefore, as shown in Fig. 6, the entire EMS of the FCHPS can be divided into a bottom level and a top level associated with the fast- and slow-responding loops, respectively. The bottom level is responsible for the stable control of voltage and current for the DC/DC converters; at this level, the voltage-and-current controller (VCC) acts as an interface between the top level and the physical components of the FCHPS. The top level implements the EMS that directs the FCHPS to minimize the cost function.

Fig. 6. Levels of the EMS and VCC in an FCHPS.

Methods used to control DC/DC converters are described in [26-28]. In the present study it was assumed that the converter-control loop performs so effectively that the steady-state errors and transient behaviours of the DC/DC converters can be neglected. Thus, the transfer function of the VCC was approximated as a constant value of 1, allowing the problem of energy management of the FCHPS to be formulated as a constrained optimal-control problem. Using the polynomial model of Eq. (2) for the hydrogen consumption, a cost function for assessing the performance of the FCHPS can be written as

$$J = \Phi(K, SoC(K)) + \sum_{k=0}^{K-1} \dot{m}_{H_2}(P_{FC}(k)) \cdot \Delta T, \tag{8}$$

where $K$ denotes the final step of the test cycle, and

$$\Phi(K, SoC(K)) = (SoC(K) - SoC(0))^T P_K (SoC(K) - SoC(0)), \tag{9}$$

which penalizes the difference between the final SoC [$SoC(K)$] at the end of the driving cycle and the initial SoC [$SoC(0)$]. This restriction ensures that the vehicle is on average powered by the FCS. The summation term in the cost function accounts for the total hydrogen consumption during an entire driving cycle. The goal of energy management is to minimize the cost function of Eq. (8), subject to the constraints imposed by the ultracapacitor dynamics given by Eq. (7) and the operational limitations of the FCS and the ESS.
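As an illustration, the cost of Eqs. (8) and (9) can be evaluated for a given fuel-cell power trajectory with the polynomial model of Eq. (2) and the coefficients of Table 1. The following is only a minimal sketch: the function names, the default sampling period `dt` and the terminal weight `p_K` are hypothetical choices that are not specified in the text.

```python
import numpy as np

# Table 1 coefficients of Eq. (2); P_FC is expressed in units of 100 kW
# and the fitted consumption rate is in g/s.
A3, A2, A1, A0 = 4.1953, -2.0662, 1.6944, 0.0066


def m_fit(p_fc_kw):
    """Fitted hydrogen consumption rate (g/s) for a fuel-cell power in kW."""
    p = np.asarray(p_fc_kw, dtype=float) / 100.0  # convert kW to units of 100 kW
    return A3 * p**3 + A2 * p**2 + A1 * p + A0


def cost(p_fc_kw, soc_0, soc_K, dt=1.0, p_K=1.0):
    """Eq. (8): terminal SoC penalty of Eq. (9) plus total hydrogen used (g).

    dt (s) and p_K are assumed values chosen for illustration only.
    """
    fuel = float(np.sum(m_fit(p_fc_kw)) * dt)
    terminal = p_K * (soc_K - soc_0) ** 2
    return terminal + fuel


# Ten one-second steps at a constant 50 kW, with the SoC returned to its start.
total = cost([50.0] * 10, soc_0=0.7, soc_K=0.7)
```

With equal initial and final SoC the terminal term vanishes, so `total` is simply the fitted hydrogen mass consumed over the ten steps.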

The principle of energy conservation means that the following equality constraint holds for this power system:

$$P_{FC}(k)\,\eta_1 + P_{ESS}(k)\,\eta_2 = P_L(k), \tag{10}$$

where $\eta_1$ and $\eta_2$ are the efficiencies of the DC/DC converters. In real-world applications, the operating ranges of the FCS and the ESS should be appropriately restricted so as to protect the high-power equipment from damage. The lower power limitation ($P_{FCmin} > 0$) is applicable in the FCS because it cannot store electrical energy. The FCS may be damaged permanently or it may degrade rapidly if any reverse current occurs, and hence $P_{FCmin}$ must be sufficiently greater than 0 to avoid violating this constraint. When the FCS is operating at a high power, the maximum power that can be drawn is limited to the rated value, $P_{FCmax}$, since excessive output power may lead to oxygen starvation that would damage the FCS. Thus, the inequality constraint on the FCS power takes the form

$$P_{FCmin} \le P_{FC}(k) \le P_{FCmax}. \tag{11}$$

In addition to the magnitude of the FCS power, the rate of change of the FCS power should also be limited to ensure that the air compressor (which has a slow dynamic response) can cope with the power fluctuations. Thus, the second inequality constraint should be of the form

$$\Delta P_{FC,fall} \le \dot{P}_{FC}(k) \le \Delta P_{FC,rise}. \tag{12}$$

Theoretically, the upper bound of the SoC ($SoC_{max}$) of the ESS is 1, but in practice this value is set somewhat smaller so as to provide a margin for safety. The lower bound of the SoC ($SoC_{min}$) is required because the power-conversion efficiency of the buck-boost converter is rather poor when the ESS voltage is extremely low. Therefore, the third inequality constraint on the operation of the FCHPS takes the form

$$SoC_{min} \le SoC(k) \le SoC_{max}. \tag{13}$$

Based on Eqs. (7)-(13), the energy-management problem can be formulated as a finite-time, constrained optimal-control problem (see Problem 1).

Problem 1.

$$\min_{\{u(k)\}_{k=0}^{K-1}} J'(x(0), U) = \min_{\{u(k)\}_{k=0}^{K-1}} \left\{ \Phi(K, x(K)) + \sum_{k=0}^{K-1} \dot{m}_{H_2}(u(k)) \cdot \Delta T \right\}$$

subject to

$$x(k+1) = f'(x(k)),$$

$$h(x, u) = 0,$$

and

$$g_i(x, u) \ge 0 \quad (i = 1, 2, \ldots, 6),$$

where $x(k) = SoC(k)$, $u(k) = P_{FC}(k)$, $U$ is a history (policy) of $u(k)$ and $K$ denotes the final time. The state equation corresponds to Eq. (7), equality constraint $h(x, u) = 0$ corresponds to the constraint in Eq. (10) and the inequalities in Eqs. (11)-(13) are reformulated in the form of $g_i(x, u) \ge 0$ as follows:

$$g_1 = u(k) - P_{FCmin} \ge 0,$$
$$g_2 = P_{FCmax} - u(k) \ge 0,$$
$$g_3 = u(k) - u(k-1) - \Delta P_{FC,fall}\,\Delta T \ge 0,$$
$$g_4 = \Delta P_{FC,rise}\,\Delta T - u(k) + u(k-1) \ge 0,$$
$$g_5 = x(k) - SoC_{min} \ge 0,$$
$$g_6 = SoC_{max} - x(k) \ge 0.$$

2.3. Transformation of constrained optimal control

The penalty-function method [18] and Bellman's principle of optimality [29] can be used to transform Problem 1 into an infinite-time, unconstrained optimal-control problem (see Problem 2).

Problem 2.

$$\min_{\{u(k)\}_{k=0}^{K-1}} J(x(0), U) = \min_{\{u(k)\}_{k=0}^{K-1}} \left\{ J^*(x(K)) + \sum_{k=0}^{K-1} L(k) \right\}$$
$$= \min_{\{u(k)\}_{k=0}^{K-1}} \left\{ J^*(x(K)) + J'(x(0), U) + \sum_{i=1}^{6} \sum_{k=0}^{K-1} S_i\, g_i^2(x(k), u(k), k)\, H(g_i)\, \Delta T \right\}, \tag{15}$$

subject to $x(k+1) = f(x(k))$, where $J^*(x(K)) = \Phi(K, x(K))$ (see Eq. (9)),

$$H(g_i) = \begin{cases} 0, & \text{if } g_i \ge 0 \\ 1, & \text{if } g_i < 0 \end{cases},$$

and $S_i$ is the penalty coefficient.

Substituting the equality constraint of Eq. (10) into the ultracapacitor model of Eq. (7) yields

$$x(k+1) = f(x(k)) = x(k) + \psi(x(k), u(k), P_L(k)), \tag{16}$$

where

$$\psi(x(k), u(k), P_L(k)) = -P_C\!\left(x(k), \frac{P_L(k) - \eta_1 u(k)}{\eta_2}\right) \cdot \frac{\Delta T}{E_{max}}. \tag{17}$$

The inequality constraints in Problem 1 are transformed into a set of penalty functions to form the adjoined cost function in Problem 2. This transforms the EMS problem into solving an infinite-time, unconstrained minimization problem, and its solution is expected to weakly converge towards an optimal solution to Problem 1 [18].
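The state update of Eqs. (16) and (17), built on the charge-discharge relation of Eq. (6), can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the Table 3 parameter values, SI units (W, J, s), a hypothetical sampling period of 1 s, and function names chosen here for readability.

```python
import math

# Table 3 values in SI units: ohm, farad, joule.
R_UC, C_UC, E_MAX = 0.035, 52.0, 1625e3
ETA1 = ETA2 = 0.95  # converter efficiencies of Eq. (10)


def p_c(soc, p_ess):
    """Eq. (6): power through the internal capacitor (W).

    p_ess > 0 means discharging, p_ess < 0 means charging.
    """
    tau = R_UC * C_UC
    disc = 1.0 - 2.0 * tau * p_ess / (soc * E_MAX)
    if disc < 0.0:
        # The RC model cannot deliver this terminal power at this SoC.
        raise ValueError("requested ESS power is not feasible")
    return (1.0 - math.sqrt(disc)) * soc * E_MAX / tau


def step(soc, p_fc, p_load, dt=1.0):
    """Eqs. (16)-(17): next SoC for fuel-cell power p_fc and load p_load (W)."""
    p_ess = (p_load - ETA1 * p_fc) / ETA2  # equality constraint, Eq. (10)
    return soc - p_c(soc, p_ess) * dt / E_MAX


soc_next = step(0.7, p_fc=20e3, p_load=40e3)  # the ESS supplies the shortfall
```

Because of the loss in the parasitic resistance, the capacitor gives up more power than reaches the terminals when discharging and stores less than is supplied when charging, which is the asymmetry visible in Fig. 5.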

3.  Synthesis  of  an  EMS  with  the  AOC  algorithm 

3.1.  Necessary  conditions  for  optimality 

Considering the minimization problem of the adjoined cost function ($J(u)$) subject to the ultracapacitor model in Eq. (16), and introducing a costate variable ($\lambda(k)$) makes it possible to transform this problem into the minimization of the Hamiltonian without any constraint [29]. The Hamiltonian is defined as

$$H(k) \triangleq L(k) + f^T(k)\,\lambda(k+1), \tag{18}$$

where $f(k) = f(k, x(k), u(k))$. The necessary conditions for optimality are obtained from the minimum principle as follows:

$$x(k+1) = f(k), \tag{19}$$

$$0 = \frac{\partial L(k)}{\partial u(k)} + \left(\frac{\partial f(k)}{\partial u(k)}\right)^T \lambda^*(k+1), \tag{20}$$

$$\lambda^*(k) = \frac{\partial L(k)}{\partial x(k)} + \left(\frac{\partial f(k)}{\partial x(k)}\right)^T \lambda^*(k+1), \tag{21}$$

$$(\lambda^*(0))^T\, dx(0) = 0, \tag{22}$$

$$\left(\frac{\partial J^*(K, x(K))}{\partial x(K)} - \lambda^*(K)\right)^T dx(K) = 0, \tag{23}$$

where '*' denotes an optimal value, $dx(K)$ and $dx(0)$ are arbitrary vectors corresponding to the variations, and $J^*(K, x(K))$ is the optimal terminal cost function. The AOC method assumes that initial state $x(0)$ is either fixed or free, and that final state $x(K)$ is free. Thus, $\lambda^*(0) = 0$ may satisfy the initial-point condition in Eq. (22), and the end-point condition of Eq. (23) gives $\lambda^*(K) = \partial J^*(K, x(K))/\partial x(K)$.

3.2. The AOC-EMS system

Solving the optimality conditions for an optimal EMS is challenging since the adjoined cost function, $J(u)$, is not in quadratic form and the dynamical constraint (i.e., state equation) is non-linear. While linear optimal-control theory is unable to provide an analytical solution to this problem, the AOC method may be implemented with neural networks to obtain an optimal strategy. Fig. 7 shows a block diagram of the AOC-EMS system. The AOC algorithm consists of three blocks: the action network, $\hat{u}(k|w)$; the critic network, $\hat{\lambda}^D(k+1|\alpha)$; and the shadow-critic network, $\hat{\lambda}(k|\alpha)$; where $w$ and $\alpha$ denote the synaptic weights of neural networks. The shadow-critic network estimates the optimal costate at the present time step, and the critic network projects this quantity to the next time step. The action network is responsible for the EMS while the critic and shadow-critic networks facilitate an adaptive critic mechanism to guide the improvement of the action network using the reinforcement-learning methodology. The critic and shadow-critic networks have identical topology and synaptic weights but different inputs and outputs.

Fig. 7. The AOC-EMS system (power flow; signals; signals only used for training).

The optimality conditions in Eqs. (20) and (21) may not hold away from an optimal trajectory. In fact, we have

$$\delta(k) = \frac{\partial L(k)}{\partial u(k)} + \left(\frac{\partial f(k)}{\partial u(k)}\right)^T \lambda(k+1), \tag{24}$$

and

$$\lambda^D(k) = \frac{\partial L(k)}{\partial x(k)} + \left(\frac{\partial f(k)}{\partial x(k)}\right)^T \lambda(k+1) + \left(\frac{\partial u(k)}{\partial x(k)}\right)^T \delta(k). \tag{25}$$

It is evident that optimality requires $\delta(k) = 0$ and $\lambda^D(k) = \lambda^*(k)$. Let $l$ be the index for the version of the synaptic weights. Substituting the neural networks for corresponding terms in Eqs. (24) and (25) yields

$$\delta(k) = \left[\frac{\partial L(k)}{\partial u(k)} + \left(\frac{\partial f(k)}{\partial u(k)}\right)^T \hat{\lambda}^D(k+1|\alpha_l)\right]_{u(k) = \hat{u}(k|w_l)}, \tag{26}$$

and

$$e(k) = \lambda^D(k) - \hat{\lambda}(k|\alpha_l) = \frac{\partial L(k)}{\partial x(k)} + \left(\frac{\partial f(k)}{\partial x(k)}\right)^T \hat{\lambda}^D(k+1|\alpha_l) + \left(\frac{\partial \hat{u}(k|w_{l+1})}{\partial x(k)}\right)^T \delta(k) - \hat{\lambda}(k|\alpha_l), \tag{27}$$

where $\delta(k)$ denotes the action residual and $e(k)$ is the critic error. These two quantities result from non-optimality or from approximation errors in neural networks.

3.3. Action-network improvement routine

The adjoined cost function in Problem 2 can be written as a recurrence relation:

$$J(k, x(k), U) = L(k) + J(k+1, x(k+1), U), \tag{28}$$

where $U$ denotes a history of $u(k)$. Bellman's principle of optimality [29] is used to obtain

$$\min_{\{u(k)\}_{k}^{K-1}} J(k, x(k), U) = \min_{u(k)} \{L(k) + J^*(k+1, x(k+1))\}. \tag{29}$$

The action-network improvement routine is designed to minimize the cost function in the minimization operation. The gradient of the adjoined cost function with respect to $w$ is

$$\frac{\partial J(k)}{\partial w} = \frac{\partial L(k)}{\partial w} + \frac{\partial J^*(k+1)}{\partial w} = \left(\frac{\partial u(k)}{\partial w}\right)^T \left[\frac{\partial L(k)}{\partial u(k)} + \left(\frac{\partial f(k)}{\partial u(k)}\right)^T \lambda^*(k+1)\right], \tag{30}$$

where $\lambda^*(k) = \partial J^*(k, x(k))/\partial x(k)$ and $J(k) = J(k, x(k), U)$. Substituting the network output for the corresponding terms yields

$$\frac{\partial J(k)}{\partial w} = \left(\frac{\partial \hat{u}(k|w)}{\partial w}\right)^T \left[\frac{\partial L(k)}{\partial u(k)} + \left(\frac{\partial f(k)}{\partial u(k)}\right)^T \hat{\lambda}^D(k+1|\alpha)\right]_{u(k) = \hat{u}(k|w)}. \tag{31}$$

Using the gradient-descent method, the generalized delta rule for updating the action network is

$$w_{l+1} = w_l + \Delta w_l, \tag{32}$$

where

$$\Delta w_l = \rho_a \Delta w_{l-1} - \mu_a \left.\frac{\partial J(k)}{\partial w}\right|_{w = w_l}. \tag{33}$$

Substituting the gradient in Eq. (31) into Eqs. (32) and (33) yields the action-network updating rule:

$$\Delta w_l = \rho_a \Delta w_{l-1} - \mu_a \left.\left(\frac{\partial \hat{u}(k|w)}{\partial w}\right)^T \delta(k)\right|_{w = w_l}, \tag{34}$$

where $\rho_a$ denotes the momentum coefficient and $\mu_a$ denotes the learning rate of the action network.
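The generalized delta rule of Eqs. (32) and (33) is gradient descent with a momentum term. The sketch below applies it to a simple scalar quadratic instead of the action network, purely to show the form of the update; the test function and the values of the learning rate `mu` and momentum coefficient `rho` are assumptions made for illustration.

```python
def momentum_step(w, dw_prev, grad, mu=0.1, rho=0.5):
    """One update of Eqs. (32)-(33): returns the new weight and its increment."""
    dw = rho * dw_prev - mu * grad  # Eq. (33)
    return w + dw, dw               # Eq. (32)


def minimise(grad_fn, w0, steps=200, mu=0.1, rho=0.5):
    """Repeat the update, starting from w0 with a zero initial increment."""
    w, dw = w0, 0.0
    for _ in range(steps):
        w, dw = momentum_step(w, dw, grad_fn(w), mu, rho)
    return w


# Stand-in cost J(w) = (w - 3)^2, whose gradient is 2 (w - 3).
w_opt = minimise(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

In the AOC algorithm the gradient passed to the update would be the product of the action-network Jacobian and the action residual, as in Eq. (34).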

3.4. Critic-network learning routine

The shadow-critic network is updated using a supervised learning method by minimizing the instantaneous sum of squared critic errors at time step $k$:

$$E_c(k) = \frac{1}{2} e^T(k)\, e(k) = \frac{1}{2} \left\| \lambda^D(k) - \hat{\lambda}(k|\alpha) \right\|_2^2. \tag{35}$$

Using the gradient-descent method yields the generalized delta rule for updating the shadow-critic network:

$$\alpha_{l+1} = \alpha_l + \Delta\alpha_l, \tag{36}$$

with

$$\Delta\alpha_l = \rho_c \Delta\alpha_{l-1} - \mu_c \left.\frac{\partial E_c(k)}{\partial \alpha}\right|_{\alpha = \alpha_l} = \rho_c \Delta\alpha_{l-1} + \mu_c \left.\left(\frac{\partial \hat{\lambda}(k|\alpha)}{\partial \alpha}\right)^T e(k)\right|_{\alpha = \alpha_l}, \tag{37}$$

where $\rho_c$ is the momentum coefficient, $\mu_c$ is the learning rate of the shadow-critic network and gradient $\partial \hat{\lambda}(k|\alpha)/\partial \alpha$ can be obtained from the back-propagation of the shadow-critic network. The shadow-critic network receives updates stepwise (as in the instantaneous mode), while the critic network obtains a duplicate of the updated weights in turn. In calculating Eq. (27), the term $\hat{\lambda}^D(k+1|\alpha)$ is taken from the critic network for $k+1 = 1, \ldots, K-1$, and $\lambda^D(K)$ at the final step is substituted by the end-point condition; that is, $\lambda^*(K) = \partial J^*(K, x(K))/\partial x(K)$.

3.5. Plant model and Jacobian quantities

As illustrated in Fig. 7, a plant model is needed to predict the step-ahead values of the plant states for computing the critic-network outputs. Using the plant model in Eq. (16) allows the Jacobian quantities appearing in Eqs. (26) and (27) to be written as follows:

$$\frac{\partial f(x(k), u(k))}{\partial u(k)} = \frac{\eta_1 \Delta T/(\eta_2 E_{max})}{\sqrt{1 - \dfrac{2 R_{uc} C_{uc}}{E_{max}} \cdot \dfrac{P_L(k) - u(k)\eta_1}{x(k)\eta_2}}}, \tag{38}$$

$$\frac{\partial f(x(k), u(k))}{\partial x(k)} = 1 + \frac{\dfrac{P_L(k) - u(k)\eta_1}{x(k)\eta_2} \cdot \dfrac{\Delta T}{E_{max}}}{\sqrt{1 - \dfrac{2 R_{uc} C_{uc}}{E_{max}} \cdot \dfrac{P_L(k) - u(k)\eta_1}{x(k)\eta_2}}} - \left(1 - \sqrt{1 - \frac{2 R_{uc} C_{uc}}{E_{max}} \cdot \frac{P_L(k) - u(k)\eta_1}{x(k)\eta_2}}\right) \frac{\Delta T}{R_{uc} C_{uc}}. \tag{39}$$
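The plant Jacobians with respect to $u(k)$ and $x(k)$ follow from differentiating the state equation, Eq. (16), and can be checked numerically against central finite differences. The sketch below is one such check under stated assumptions: Table 3 parameters in SI units, a hypothetical 1 s sampling period, and function names invented for this example.

```python
import math

R_UC, C_UC, E_MAX = 0.035, 52.0, 1625e3  # Table 3, SI units
ETA1 = ETA2 = 0.95
TAU = R_UC * C_UC


def _root(x, u, p_load):
    """Square-root term shared by the plant model and both Jacobians."""
    return math.sqrt(1.0 - 2.0 * TAU / E_MAX * (p_load - ETA1 * u) / (x * ETA2))


def f(x, u, p_load, dt=1.0):
    """State equation, Eqs. (16)-(17), with Eq. (6) substituted."""
    return x - (1.0 - _root(x, u, p_load)) * x * dt / TAU


def df_du(x, u, p_load, dt=1.0):
    """Analytic derivative of f with respect to the fuel-cell power u."""
    return (ETA1 * dt / (ETA2 * E_MAX)) / _root(x, u, p_load)


def df_dx(x, u, p_load, dt=1.0):
    """Analytic derivative of f with respect to the state of charge x."""
    s = _root(x, u, p_load)
    ratio = (p_load - ETA1 * u) / (x * ETA2)
    return 1.0 + ratio * (dt / E_MAX) / s - (1.0 - s) * dt / TAU


# Operating point chosen only for illustration.
x0, u0, pl0 = 0.7, 20e3, 40e3
num_du = (f(x0, u0 + 1.0, pl0) - f(x0, u0 - 1.0, pl0)) / 2.0
num_dx = (f(x0 + 1e-6, u0, pl0) - f(x0 - 1e-6, u0, pl0)) / 2e-6
```

Agreement between the numeric and analytic values is a quick sanity check before the Jacobians are used in the training loop.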


Fig. 8. A function that provides a smooth penalty transition.

Fig. 9. Architecture of the three-layer RBF network.

The derivatives of Lagrangian $L(k)$ in adjoined cost function $J(k)$ with respect to $u(k)$ and $x(k)$ are as follows:

$$\frac{\partial L(k)}{\partial x(k)} = \frac{\partial \left(\sum_{i=1}^{6} S_i g_i^2 H(g_i)\right) \cdot \Delta T}{\partial x(k)} = \Delta T \cdot \sum_{i=1}^{6} \left(2 S_i g_i \frac{\partial g_i}{\partial x} H(g_i) + S_i g_i^2 \frac{\partial H(g_i)}{\partial g_i} \frac{\partial g_i}{\partial x}\right), \tag{40}$$

$$\frac{\partial L(k)}{\partial u(k)} = \frac{\partial \dot{m}_{H_2} \cdot \Delta T}{\partial u(k)} + \frac{\partial \left(\sum_{i=1}^{6} S_i g_i^2 H(g_i)\right) \cdot \Delta T}{\partial u(k)} = \left(3 a_3 u^2(k) + 2 a_2 u(k) + a_1\right) \Delta T + \Delta T \cdot \sum_{i=1}^{6} \left(2 S_i g_i \frac{\partial g_i}{\partial u} H(g_i) + S_i g_i^2 \frac{\partial H(g_i)}{\partial g_i} \frac{\partial g_i}{\partial u}\right). \tag{41}$$

Table 2
Parameters of the electric vehicle.

Parameter                            Symbol    Value
Total mass                           M         2049 kg
Air drag coefficient                 C_d       0.31
Frontal area                         A_f       2 m^2
Air density                          rho_a     1.23 kg m^-3
Coefficient of rolling resistance    f_r       0.01
Acceleration due to gravity          g         9.8 m s^-2

Table 3
Parameters of the FCHPS.

Parameter                               Symbol           Value
Maximum FCS power                       P_FCmax          50 kW
Minimum FCS power                       P_FCmin          5 kW
Maximum FCS rising power rate           dP_FC,rise       5 kW s^-1
Maximum FCS falling power rate          dP_FC,fall       -5 kW s^-1
Internal resistance of ultracapacitor   R_uc             0.035 ohm
Capacitance of ultracapacitor           C_uc             52 F
Maximum ultracapacitor energy           E_max            1625 kJ
Maximum SoC of ultracapacitor           SoC_max          0.95
Minimum SoC of ultracapacitor           SoC_min          0.25
Efficiency of boost converter           eta_1            0.95
Efficiency of buck-boost converter      eta_2            0.95

Fig. 10. Velocity time course and corresponding power demands of the NEDC.

$H(g_i)$ is chosen to be a sigmoid function (instead of a discontinuous step function):

$$H(g_i) = \frac{1}{1 + \exp(\sigma g_i)}, \tag{42}$$

since the function must be differentiable for the generalized delta rule to be applicable, and hence must provide a smooth transition between legal and illegal regions of inequality constraints so that it is differentiable at critical point $g_i = 0$. A larger value of $\sigma$ in Eq. (42) leads to a steeper slope in the transition region and hence a better approximation to the ideal case as defined in Problem 2. Fig. 8 shows a plot of this function for $\sigma = 50$, the value used in this study.
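The smooth penalty of Eq. (42) and the per-constraint derivative term that appears inside the sums of Eqs. (40) and (41) can be sketched as follows. This is an illustrative sketch only: the penalty coefficient passed as `s` is a hypothetical value, and the sigmoid is evaluated in a branch-wise form, an implementation choice made here to avoid floating-point overflow for large arguments.

```python
import math

SIGMA = 50.0  # steepness used in the study


def H(g):
    """Eq. (42): close to 1 where g < 0 (violated), close to 0 where g > 0."""
    z = SIGMA * g
    if z >= 0.0:
        e = math.exp(-z)  # safe: the argument is non-positive
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def dH_dg(g):
    """Derivative of Eq. (42) with respect to g."""
    h = H(g)
    return -SIGMA * h * (1.0 - h)


def penalty(g, s):
    """One term S_i g_i^2 H(g_i) of the adjoined cost."""
    return s * g * g * H(g)


def dpenalty_dg(g, s):
    """Derivative of one penalty term with respect to g_i (chain-rule core)."""
    return 2.0 * s * g * H(g) + s * g * g * dH_dg(g)


slope_at_zero = dH_dg(0.0)  # equals -SIGMA / 4
```

Multiplying `dpenalty_dg` by the derivative of each constraint with respect to the state or the input, and by the sampling period, gives the corresponding summand of the Lagrangian gradient.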

3.6. Neural network architecture for the AOC-EMS system

In the AOC-EMS system, the associated neural networks are implemented with RBF networks [30]. Fig. 9 shows the architecture of RBF networks in which each hidden neuron $h_j$ is modelled as a Gaussian function:

$$h_j(x) = \exp\left(-\frac{\|x - c_j\|^2}{2 b_j^2}\right), \tag{43}$$

where $x$ denotes the input vector of the RBF network, $c_j$ denotes the centre vector and $b_j$ is the width of the Gaussian function. The functional mapping of an RBF network can be written as

$$y = \sum_{j=1}^{N_h} w_j h_j(x) = w_1 h_1(x) + w_2 h_2(x) + \cdots + w_{N_h} h_{N_h}(x), \tag{44}$$

where $w_j$ is a vector of synaptic weights connecting the $j$-th hidden neuron to output vector $y$.

Table 4
Fuzzy-logic rules for the EMS (output: P_ESS).

                  P_L
SoC      NB    NS    Z     PS    PB
Z        NB    NB    NB    NS    Z
PS       NB    NS    Z     PS    PB
PB       Z     Z     Z     PS    PB

P, positive; N, negative; B, big; S, small; Z, zero.

Fig. 11. Charge-discharge contour plot of the ultracapacitor bank generated by the fuzzy-logic rules.

Fig. 12. Trajectories of energy management obtained by the fuzzy-EMS for the FCHPS (NEDC).

Fig. 13. Trajectories of energy management obtained by the AOC-EMS for the FCHPS (NEDC).

The partial derivatives pertaining to the RBF network are as follows:

$$\frac{\partial y}{\partial w_j} = h_j(x), \tag{45}$$

$$\frac{\partial y}{\partial b_j} = w_j \cdot h_j(x) \cdot \frac{\|x - c_j\|^2}{b_j^3}, \tag{46}$$

$$\frac{\partial y}{\partial c_j} = w_j \cdot h_j(x) \cdot \frac{(x - c_j)^T}{b_j^2}, \tag{47}$$

$$\frac{\partial y}{\partial x} = -\sum_{j=1}^{N_h} w_j \cdot h_j(x) \cdot \frac{(x - c_j)^T}{b_j^2}. \tag{48}$$

These partial derivatives can be used to derive the gradient quantities such as $\partial \hat{u}(k|w)/\partial x(k)$ and $\partial \hat{\lambda}(k|\alpha)/\partial \alpha$ for the AOC-EMS system.
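A compact numerical sketch of the RBF mapping of Eqs. (43) and (44) and the partial derivatives of Eqs. (45)-(48) is given below for the scalar-output case. It assumes Gaussian units of the form exp(-||x - c_j||^2 / (2 b_j^2)); the function names, array shapes and the example centres, widths and weights are illustrative assumptions, not values from the study.

```python
import numpy as np


def rbf_forward(x, centres, widths, weights):
    """Eqs. (43)-(44) for a scalar output.

    x: (n,), centres: (Nh, n), widths: (Nh,), weights: (Nh,).
    """
    d = x - centres                       # (Nh, n) offsets x - c_j
    sq = np.sum(d * d, axis=1)            # squared distances ||x - c_j||^2
    h = np.exp(-sq / (2.0 * widths**2))   # hidden-neuron outputs h_j(x)
    return float(weights @ h), h, d, sq


def rbf_grads(x, centres, widths, weights):
    """Eqs. (45)-(48): gradients of y w.r.t. weights, widths, centres, input."""
    _, h, d, sq = rbf_forward(x, centres, widths, weights)
    dy_dw = h                                            # Eq. (45)
    dy_db = weights * h * sq / widths**3                 # Eq. (46)
    dy_dc = (weights * h / widths**2)[:, None] * d       # Eq. (47)
    dy_dx = -np.sum(dy_dc, axis=0)                       # Eq. (48)
    return dy_dw, dy_db, dy_dc, dy_dx


# Small example network with two inputs and three hidden neurons.
x = np.array([0.6, 0.3])
centres = np.array([[0.2, 0.1], [0.8, 0.5], [0.5, 0.9]])
widths = np.array([0.4, 0.5, 0.6])
weights = np.array([1.0, -2.0, 0.5])
y, *_ = rbf_forward(x, centres, widths, weights)
dy_dw, dy_db, dy_dc, dy_dx = rbf_grads(x, centres, widths, weights)
```

The input gradient is the negative sum of the centre gradients because each hidden unit depends on the input only through the offset from its centre.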


4.  Validations 


The effectiveness of the AOC-EMS system in managing the energy usage of an electric vehicle powered by an FCHPS was investigated. Table 2 lists the model parameters of the electric vehicle. The FCHPS consisted of a 50-kW FCS and a bank of ultracapacitors. With the power-train parameters listed in Table 3, the electric vehicle was capable of accelerating from 0 to 100 km h$^{-1}$ in 10 s if the initial SoC of the ESS was greater than 70%. The operational limitations of the power devices, stated as [min, max], were as follows: [5, 50] kW for the fuel-cell power, [-5, 5] kW s$^{-1}$ for the power-variation rate of the fuel cell and [0.2, 0.95] for the SoC of the ESS bank.

The  AOC-EMS  system  was  tested  in  five  standard  driving 
cycles:  the  New  European  Driving  Cycle  (NEDC),  the  UDDS  (Urban 
Dynamometer  Driving  Schedule),  the  Highway  Fuel  Economy  Test 
(HWFET),  Aggressive  Driving  Cycle  (US-06),  and  the  FTP  (Federal 
Test  Procedure).  Fig.  10  illustrates  the  velocity  time  course  for  the 
NEDC  and  the  corresponding  power  demands. 

For comparison, a fuzzy-logic-based EMS (fuzzy-EMS) with linguistic
rules constructed by an expert was also investigated for the same
driving cycles. Depending on the load power, P_L(k), and the SoC(k)
of the ESS, the fuzzy-EMS performed a linguistic inference based on
the rules listed in Table 4 to generate the appropriate power-split
command of the ESS bank. The principal concept underlying the
applied fuzzy rules was to ensure that the fuel cell satisfied the
power demands while maintaining the SoC of the ESS bank within the
permitted range. The charge-discharge contour plot of the ESS bank
generated by the fuzzy-EMS is shown in Fig. 11.

Figs. 12 and 13 show the test results for the fuzzy-EMS and AOC-EMS,
respectively. A fair comparison was ensured by requiring each test
case to meet the condition that the final SoC after completing a
cycle was within ±0.5% of the initial SoC. The total hydrogen
consumption over the NEDC was 129.9 g for the fuzzy-EMS and 122 g
for the AOC-EMS. The results show that the AOC-EMS not only lowered
the hydrogen consumption but also significantly reduced both the
peak power and the power-variation rate of the fuel cell.
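The NEDC figures quoted above can be turned into a relative saving with a short calculation. The sketch below uses only the two hydrogen masses from the text. The helper names are hypothetical, and the reading of the ±0.5% condition as an absolute SoC difference of 0.005 is an assumption, since the text does not say whether the tolerance is absolute or relative.

```python
H2_FUZZY_G = 129.9   # hydrogen used over the NEDC with the fuzzy-EMS (from the text)
H2_AOC_G = 122.0     # hydrogen used over the NEDC with the AOC-EMS (from the text)


def relative_saving(baseline, candidate):
    """Fractional reduction of candidate relative to baseline."""
    return (baseline - candidate) / baseline


def soc_balanced(soc_initial, soc_final, tol=0.005):
    """Assumed reading of the +/-0.5% condition: |final - initial| <= 0.005."""
    return abs(soc_final - soc_initial) <= tol


saving = relative_saving(H2_FUZZY_G, H2_AOC_G)
print(f"{100 * saving:.1f}% less hydrogen than the fuzzy-EMS")  # 6.1%
```

On this cycle the AOC-EMS therefore used about 6% less hydrogen than the fuzzy-EMS, a comparison that is meaningful only because both runs end at nearly the same SoC.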

The results of applying the designed strategy in the five driving
cycles are compared in Fig. 14 with those for the electric vehicle
powered by a fuel cell alone. Relative to the fuel-cell-only
vehicle, the average reduction in hydrogen consumption was 23.3%
for the AOC-EMS and 18.7% for the fuzzy-EMS.

Fig. 14. Comparison of the hydrogen consumptions in five standard driving cycles (NEDC, UDDS, HWFET, US-06 and FTP).

Fig. 15. Fuel-cell net-power distribution (horizontal axis: fuel-cell net power / kW) with AOC-EMS applied in the NEDC, HWFET and US-06.

The performance of the synthesized AOC-EMS was evaluated further.
Fig. 15 illustrates the fuel-cell power distribution with the
AOC-EMS for three types of driving cycle. In the NEDC, which
consists of frequent accelerations and decelerations, the fuel cell
operated mostly in the low-power region, meaning that the
ultracapacitor bank can greatly reduce the burden on the FCS in an
urban-like driving pattern. In the HWFET the vehicle is driven in a
highway-like pattern, which leads to a higher sustained demand for
electrical power; the results show that the power distribution
shifted towards the high-efficiency region of the fuel cell,
corresponding to a higher average power. Under the US-06 test, which
contains both urban-like and highway-like driving patterns, the
fuel-cell power distribution was concentrated in the region
characterized by extremely low hydrogen consumption and the highest
efficiency. These observations confirm that the AOC-EMS is indeed
effective in minimizing fuel consumption under various types of
driving patterns.
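A power distribution of the kind plotted in Fig. 15 can be obtained by binning a sampled fuel-cell power trace. The following sketch shows one way to do this. The bin width, the function name and the sample trace are hypothetical and are not data from this work; only the 50-kW upper limit is taken from the text.

```python
def power_distribution(power_kw, bin_width_kw=5.0, p_max_kw=50.0):
    """Fraction of samples in each bin [k*w, (k+1)*w).

    Samples at or above p_max_kw are counted in the last bin.
    """
    n_bins = int(round(p_max_kw / bin_width_kw))
    if not power_kw:
        return [0.0] * n_bins
    counts = [0] * n_bins
    for p in power_kw:
        idx = int(p // bin_width_kw)
        idx = max(0, min(idx, n_bins - 1))
        counts[idx] += 1
    return [c / len(power_kw) for c in counts]


# Hypothetical fuel-cell power samples in kW (not measured data).
trace = [5.0, 6.5, 7.0, 12.0, 18.0, 22.5, 8.0, 5.5, 30.0, 50.0]
dist = power_distribution(trace)
for k, share in enumerate(dist):
    print(f"{5 * k:2d}-{5 * (k + 1):2d} kW: {share:.0%}")
```

Comparing such histograms across cycles shows directly whether the fuel cell spends most of its time in a low-power or a high-efficiency band.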

5.  Conclusions 

This work formulated the complex requirements for the energy
management of the FCHPS as a constrained optimal-control problem.
The penalty-function method was used to systematically transform
the (tedious) constrained problem into an unconstrained problem.
The results have shown that the AOC-EMS system is able to synthesize
an optimal EMS through reinforcement learning. The AOC-EMS system
requires a pretraining procedure to obtain convergent weights for
each neural network. In the pretraining phase, the full pattern of a
driving cycle should be presented to this system so that all
possible driving situations are taken into account. To execute the
EMS, convergent pretraining should produce an action network that
constitutes an approximately optimal strategy for the
optimal-control problem. The pretrained action network can be
applied in a stochastic real-time execution without prior knowledge
of future driving patterns. The AOC-EMS system can also fine-tune
the trained action network through online learning in order to
adapt to a real-world environment.